Nanopore sequencers generate raw electrical signals that encode RNA sequence information. In direct RNA sequencing workflows, these signals are written as high-volume binary files. Most current instruments and pipelines use the POD5 format for storage and transfer. POD5 supports streaming writes and efficient random access during downstream analysis. Handling these files correctly is essential for reliable results and stable pipelines. This article explains practical steps with command examples and output snippets. You will learn how to inspect, merge, filter, subset, repack, and convert signal files. We also cover performance tuning, quality control, troubleshooting, and workflow integration. Each section is written for scientists and engineers who manage sequencing projects.
POD5 replaces legacy FAST5 in most production environments today. The format couples compact storage with reliable metadata indexing. It allows acquisition software to stream reads directly to persistent storage, which reduces temporary bottlenecks and minimizes partial writes. POD5 stores its data in Apache Arrow tables, a columnar layout that accelerates analytics. Fast reads make integrity checks and targeted extraction practical at scale. Large projects benefit because parallel workers can access distinct chunks safely. Service providers value predictable throughput and simpler file lifecycle management.
Install the POD5 toolkit using Python packaging. Use virtual environments for isolation.
pip install pod5
Confirm that the command line interface is available and versioned.
pod5 --version
Record the toolkit version in your run logs and analysis notebooks.
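One way to record the version automatically is a small helper that runs a version command and appends the result to a log. This is a minimal sketch using only the Python standard library; the function name record_tool_version and the log file name are hypothetical, and it assumes the tool prints its version and exits with status 0.

```python
import datetime
import subprocess
import sys


def record_tool_version(cmd, log_path):
    """Run a version command and append its timestamped output to a log file."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Some tools print the version on stderr, so fall back to it if stdout is empty.
    version = (result.stdout or result.stderr).strip()
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{' '.join(cmd)}\t{version}\n")
    return version


# Example call (assumes the pod5 CLI is on PATH; the log name is hypothetical):
# record_tool_version(["pod5", "--version"], "run_versions.log")
```

Each call appends one tab-separated line, so the same log can hold versions for several tools across runs.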
Start with quick summaries that surface obvious problems before basecalling.
Use pod5 view to build a compact table containing essential fields only.
pod5 view input.pod5 --include "read_id,channel,num_samples,end_reason" --output summary.tsv --separator "\t"
Illustrative output shows read identifiers, channels, sample counts, and end reasons.
read_id channel num_samples end_reason
00000000-0000-0000-0000-000000000001 23 45000 signal_positive
00000000-0000-0000-0000-000000000002 24 45210 unblock_mux_change
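A summary table like the one above can be screened with a few lines of Python before basecalling. The sketch below assumes a tab-separated file with a header row containing read_id, num_samples, and end_reason columns; the function name and the min_samples threshold of 1000 are hypothetical and should be tuned to your data.

```python
import csv
from collections import Counter


def summarize_reads(tsv_path, min_samples=1000):
    """Count end reasons and collect IDs of reads shorter than min_samples."""
    end_reasons = Counter()
    short_reads = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            end_reasons[row["end_reason"]] += 1
            # num_samples is the raw signal length of the read.
            if int(row["num_samples"]) < min_samples:
                short_reads.append(row["read_id"])
    return end_reasons, short_reads
```

A sudden shift in the end reason counts, or a long list of short reads, is a cue to look at individual reads more closely.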
Inspect global integrity metrics and logs with the summary mode.
pod5 inspect summary input.pod5
Drill into individual reads when specific anomalies require deeper review.
pod5 inspect read input.pod5 00000000-0000-0000-0000-000000000001
Capture screenshots to document anomalies and share them with collaborators.
Consistent output archives make regression analysis fast during method updates.
Merging simplifies downstream scheduling when many POD5 fragments exist.
pod5 merge *.pod5 -o merged.pod5 --duplicate-ok
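The --duplicate-ok flag lets the merge proceed even when read identifiers repeat across inputs, so it can be worth checking for repeats first. The following sketch assumes you have exported one tab-separated table per input file (for example with pod5 view) and that each table has a read_id column; the function name is hypothetical.

```python
import csv
from collections import defaultdict


def find_duplicate_read_ids(tsv_paths):
    """Map each read_id seen in more than one table to the tables containing it."""
    seen = defaultdict(set)
    for path in tsv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                seen[row["read_id"]].add(path)
    # Keep only identifiers that occur in at least two different tables.
    return {rid: sorted(paths) for rid, paths in seen.items() if len(paths) > 1}
```

An empty result suggests the fragments do not overlap, in which case the flag is probably unnecessary.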
Filtering extracts reads of interest using a deterministic list of identifiers.
pod5 filter input.pod5 --output filtered.pod5 --ids read_ids.txt
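The identifier list can be generated from a summary table instead of being written by hand. This sketch assumes a tab-separated summary with read_id and num_samples columns and writes one identifier per line; the function name and the selection rule (a minimum sample count) are hypothetical choices for illustration.

```python
import csv


def write_read_ids(summary_tsv, ids_path, min_samples):
    """Write one read_id per line for reads with at least min_samples samples."""
    kept = 0
    with open(summary_tsv, newline="", encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src, delimiter="\t"):
            if int(row["num_samples"]) >= min_samples:
                dst.write(row["read_id"] + "\n")
                kept += 1
    # Return the number of identifiers written so callers can log it.
    return kept
```

Because the list is derived from a fixed table and a fixed threshold, rerunning the filter gives the same selection.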
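Subsetting creates groups by barcode or quality status for organized processing. Input paths come first, and the example assumes the summary file contains columns named pod5 and barcode.
pod5 subset pod5/ -s sequencing_summary.txt --columns pod5 barcode --template "pod5_{pod5}/{barcode}/{pod5}.{barcode}.pod5"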
Repacking improves I/O patterns and reduces fragmentation in heavy pipelines.
pod5 repack pod5s/*.pod5 --output repacked_pods/
Convert between formats when legacy tools require FAST5 inputs or outputs.
pod5 convert fast5 ./fast5/ --output pod5/ --one-to-one ./fast5/
Produce FAST5 from POD5 when specific utilities remain unported to POD5.
pod5 convert to_fast5 input.pod5 --output fast5/
Signal pipelines stress storage and compute concurrently. Plan resources carefully.
Create dashboards that track throughput, error rates, and queue depth over time.
Share weekly performance reports with stakeholders to align on capacity upgrades.
The following sequence demonstrates a compact intake routine for one run.
# Summaries
pod5 view run1.pod5 --include "read_id,channel,num_samples" > run1_summary.tsv
# Integrity logs
pod5 inspect summary run1.pod5 > run1_integrity.log
# Merge shards
pod5 merge run1_barcode01.pod5 run1_barcode02.pod5 -o run1_merged.pod5
# Repack for performance
pod5 repack run1_merged.pod5 --output repacked_run1/
# Convert for legacy tools
pod5 convert to_fast5 repacked_run1/run1_merged.pod5 --output fast5_out/
Representative output snippets are included for documentation and training. They are illustrative; exact fields and layout depend on the toolkit version.
POD5 file version: 0.3.28
Reads: 1245678
Channels: 512
Integrity: OK
read_id channel num_samples
00000000-0000-0000-0000-000000000001 23 47892
00000000-0000-0000-0000-000000000002 24 48010
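The intake sequence above can also be scripted so that a failed step stops the run. The sketch below is one possible wrapper, not part of the POD5 toolkit: the function names are hypothetical, it covers only the merge, summary, and repack steps, and it assumes the pod5 CLI is on PATH and that repack accepts an --output directory.

```python
import shlex
import subprocess


def intake_commands(run, shards):
    """Build the intake commands for one run as argument lists."""
    merged = f"{run}_merged.pod5"
    return [
        ["pod5", "merge", *shards, "-o", merged],
        ["pod5", "inspect", "summary", merged],
        # Assumption: repack takes its output directory via --output.
        ["pod5", "repack", merged, "--output", f"repacked_{run}/"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set, stopping on failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)
```

Calling run_all(intake_commands("run1", ["run1_barcode01.pod5", "run1_barcode02.pod5"])) prints the commands without executing them; pass dry_run=False to run them for real.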
Maintain a runbook that documents symptoms, root causes, and durable fixes.
Share lessons across teams to reduce repeated investigation time during sprints.
Once POD5 files have been processed and quality checked, the next step in Direct RNA Sequencing workflows is visualizing the raw electrical signal. Visualization bridges machine output and human interpretation, helping to validate basecalling, detect motif-associated signal patterns, and explore RNA modifications. Squigualiser is one of the most widely used tools for this purpose.
Option 1. Precompiled binary release:
wget https://github.com/hiruna72/squigualiser/releases/download/squigualiser-v0.6.1/squigualiser-v0.6.1-linux-x86-64-binaries.tar.gz -O squigualiser.tar.gz
tar xf squigualiser.tar.gz
cd squigualiser
./squigualiser --help
Option 2. Python installation via pip:
pip install squigualiser
Test the installation with sample data:
wget https://hiruna72.github.io/squigualiser/docs/sample_dataset.tar.gz
tar xf sample_dataset.tar.gz
squigualiser plot_pileup -f ref.fasta -s reads.blow5 -a eventalign.bam -o dir_out --region chr1:92,778,040-92,782,120 --tag_name "test_0"
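Step 1. Basecall with Dorado using --emit-moves. The model shown is a DNA model; for direct RNA data, substitute the RNA model that matches your kit: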
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 input.pod5 --emit-moves > basecalls.bam
Step 2. Reform BAM for plotting:
squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam basecalls.bam -o reform_output.paf
Step 3. Extract sequences for alignment:
samtools fasta basecalls.bam > pass.fasta
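Step 4. Align sequences to the reference genome, then sort and index the result so it is a valid BAM file:
minimap2 -t 16 -ax map-ont ref.fa pass.fasta | samtools sort -o mapped.bam -
samtools index mapped.bam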
Step 5. Convert POD5 to SLOW5/BLOW5 format:
blue-crab p2s input.pod5 -o input.blow5
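Step 6. Plot signal–read graphs. For read-level plots, the alignment input is the signal-to-read PAF from Step 2, and an output directory is required:
squigualiser plot --file pass.fasta --slow5 input.blow5 --alignment reform_output.paf -o plots/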
The generated plots display:
- X-axis: nucleotide positions, color-coded by base.
- Y-axis: current intensity values.
- Multiple aligned reads stacked together to reveal consistent patterns or deviations.
This visualization is valuable for validating new basecalling models, identifying motif-linked artifacts, or training new researchers.
By combining POD5 file management with Squigualiser visualization, researchers ensure both technical integrity and intuitive confirmation of their sequencing data. Clean, repacked files reduce computational noise, while signal-level plots highlight whether basecalling and modification signatures are reliable. This workflow forms the foundation for downstream RNA modification detection and differential methylation analysis.
Repacking is not always necessary. Repack large, fragmented sets, or repack when throughput drops during GPU usage.
Keep originals until validation completes and checksums match across every audit step.
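Checksum comparison between an original and its archived copy can be sketched with the standard library. The function names below are hypothetical. Note that this check applies to byte-for-byte copies; a repacked or merged file is expected to have a different digest than its inputs.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large signal files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original, copy):
    """Return True when both files have identical SHA-256 digests."""
    return sha256_of(original) == sha256_of(copy)
```

Storing the digest next to each file at acquisition time lets later audit steps recompute and compare it without access to the original.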
Mixing POD5 and FAST5 in one pipeline is possible, but it adds complexity. Standardize on POD5 for new work and maintain a small FAST5 bridge for legacy steps.
Track end reasons, sample counts, and channel usage. Add alarms for anomalies that deviate from historical baselines.
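A simple baseline alarm compares the current value of a metric with the mean and standard deviation of earlier runs. This is a minimal sketch of that idea, not a recommended statistical method; the function name and the three-sigma default are hypothetical choices.

```python
import statistics


def deviates_from_baseline(value, history, max_sigma=3.0):
    """Flag value when it lies more than max_sigma standard deviations from the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # A perfectly constant history: any different value is an anomaly.
        return value != mean
    return abs(value - mean) / spread > max_sigma
```

For example, history could hold the fraction of unblocked reads from previous runs, and value the fraction from the current run.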
Use prior runs to model per hour growth, include replication overhead, and budget headroom for reprocessing.
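The storage estimate reduces to simple arithmetic: hourly growth times run length, multiplied by the number of stored copies, plus a fractional headroom. The sketch below assumes that model; the function names and the default values for replicas and headroom are hypothetical and should be replaced with your own figures.

```python
def mean_growth_gb_per_hour(prior_runs):
    """Average growth rate from (total_gb, hours) pairs of earlier runs."""
    rates = [total_gb / hours for total_gb, hours in prior_runs]
    return sum(rates) / len(rates)


def estimate_storage_gb(gb_per_hour, run_hours, replicas=2, headroom=0.25):
    """Raw growth times copy count, increased by a fractional headroom."""
    raw = gb_per_hour * run_hours
    return raw * replicas * (1 + headroom)
```

With a growth rate of 10 GB per hour, a 72-hour run, two copies, and 25% headroom, the estimate is 720 × 2 × 1.25 = 1800 GB.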
For research purposes only, not intended for personal diagnosis, clinical testing, or health assessment